Abstract

We develop an improved bound for the approximation error of the Nyström method under the assumption that there is a large eigengap in the spectrum of the kernel matrix. This is based on the empirical observation that the eigengap has a significant impact on the approximation error of the Nyström method. Our approach is based on the concentration inequality of integral operators and the theory of matrix perturbation. Our analysis shows that when there is a large eigengap, we can improve the approximation error of the Nyström method from $O(N/m^{1/4})$ to $O(N/m^{1/2})$ when measured in Frobenius norm, where $N$ is the size of the kernel matrix, and $m$ is the number of sampled columns.



1 Introduction 

The Nyström method has been used in kernel learning to approximate large kernel matrices (Fowlkes et al., 2004a; Platt, 2004; Kumar et al., 2009; Zhang et al., 2008; Williams & Seeger, 2001; Cortes et al., 2010; Talwalkar et al., 2008; Drineas & Mahoney, 2005; Silva & Tenenbaum, 2003; Belabbas & Wolfe, 2009; Talwalkar & Rostamizadeh, 2010). In order to evaluate the quality of the Nyström method, we typically bound the norm of the difference between the original kernel matrix and the low rank approximation created by the Nyström method. Both the Frobenius norm and the spectral norm have been used to bound the difference between matrices (Drineas & Mahoney, 2005). The key result from (Drineas & Mahoney, 2005) is that, besides the intrinsic error due to the low rank approximation, the additional error caused by the Nyström method is $O(N/m^{1/4})$ when measured in Frobenius norm, provided that the diagonal elements of the kernel matrix are bounded by a constant. In this work, we consider the case when there is a large eigengap in the spectrum of the kernel matrix, a scenario that has been examined in many studies of kernel learning (Bach & Jordan, 2003; Luxburg, 2007; Azran & Ghahramani, 2006; Shi et al., 2009). Given a sufficiently large eigengap, we are able to improve the bound for the additional approximation error caused by the Nyström method to $O(N/m^{1/2})$ when measured in Frobenius norm. The key techniques used in our analysis are the concentration inequality of integral operators (Smale & Zhou, 2009) and matrix perturbation theory (Stewart & Sun, 1990).

Figure 1: Additional approximation error $\|K - \hat{K}_r\|_F - \|K - K_r\|_F$ and eigengap $\lambda_r - \lambda_{r+1}$. Both the additional approximation error and the eigengap are scaled appropriately so that they fall into the same range.

Our paper is structured as follows: in Section 2, we demonstrate a discrepancy between the theoretical and experimental approximation error of the Nyström method that motivates our work to improve the existing bounds. Section 3 introduces the problem formally and proves the bounds. Finally, Section 4 concludes the paper.



2 Background and Motivation 



The Nyström method was first suggested in (Williams & Seeger, 2001) to improve the computational efficiency of Gaussian processes. It was subsequently adopted in kernel learning (Fowlkes et al., 2004a; Platt, 2004; Kumar et al., 2009; Zhang et al., 2008; Talwalkar et al., 2008; Drineas & Mahoney, 2005; Silva & Tenenbaum, 2003; Cortes et al., 2010; Belabbas & Wolfe, 2009; Talwalkar & Rostamizadeh, 2010).



Several analyses have been presented to bound the approximation error of the Nyström method (Drineas & Mahoney, 2005; Kumar et al., 2009; Belabbas & Wolfe, 2009; Talwalkar & Rostamizadeh, 2010). Most of them are based on the result from (Drineas & Mahoney, 2005), except for (Talwalkar & Rostamizadeh, 2010), whose analysis is limited to low rank kernel matrices and does not apply to the general case.

Let $K \in \mathbb{R}^{N\times N}$ be the kernel matrix to be approximated. Let $K_r$ be the best rank-$r$ approximation of the kernel matrix, and let $\hat{K}_r$ be an approximate kernel matrix of rank $r$ generated by the Nyström method. Assume $K_{i,i} \le 1$ for any $i \in [N]$. Let $m$ be the number of columns uniformly sampled from $K$ to construct $\hat{K}_r$. Both the Frobenius norm and the spectral norm are used to bound the difference between $K$ and $\hat{K}_r$. We note that it is important to derive the approximation errors measured in both norms, as they have different implications. According to (Cortes et al., 2010), the approximation error measured in spectral norm is closely related to the generalization performance of kernel classifiers. On the other hand, the approximation error measured in Frobenius norm has found applications in kernel PCA (Schölkopf et al., 1998), low dimensional manifold embedding (Belkin & Niyogi, 2001), and spectral clustering (Fowlkes et al., 2004b; Chitta et al., 2011). Improving the bound in the Frobenius norm will help us better understand the application of the Nyström method to those domains.

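Drineas & Mahoney (2005) show that, with a high probability, we have

$$\|K - \hat{K}_r\|_2 \le \|K - K_r\|_2 + O\left(\frac{N}{\sqrt{m}}\right), \tag{1}$$

$$\|K - \hat{K}_r\|_F \le \|K - K_r\|_F + O\left(\frac{N}{m^{1/4}}\right), \tag{2}$$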



where $\|\cdot\|_2$ and $\|\cdot\|_F$ stand for the spectral norm and the Frobenius norm of a matrix, respectively. Compared to the bound in spectral norm in (1), the bound measured in Frobenius norm is significantly worse in terms of $m$, with a convergence rate of $O(m^{-1/4})$. The difference between the two bounds in (1) and (2) leads to the following question:

Under what scenario is it possible to improve the convergence rate of the bound in Frobenius norm to that of the bound measured in spectral norm?
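To make the difference between the two rates concrete, the following sketch evaluates only the leading terms of (1) and (2), $N/\sqrt{m}$ and $N/m^{1/4}$, with all hidden constants set to one (an assumption made purely for illustration); the function names are hypothetical.

```python
import math

def frobenius_term_dm(N, m):
    """Leading term N / m**(1/4) of the additional error in bound (2); constants set to 1."""
    return N / m ** 0.25

def spectral_term(N, m):
    """Leading term N / sqrt(m) of the additional error in bound (1); constants set to 1."""
    return N / math.sqrt(m)

N = 10_000
for m in (100, 1_600, 25_600):
    # m grows 16x per step: the m^(-1/4) term halves, the m^(-1/2) term shrinks 4x.
    print(m, round(frobenius_term_dm(N, m), 1), round(spectral_term(N, m), 1))
```

Shrinking the additional error by a factor of four therefore costs a $256\times$ larger sample at the rate of (2), versus $16\times$ at the rate of (1).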

To this end, we first examine empirically the additional approximation error $\|K - \hat{K}_r\|_F - \|K - K_r\|_F$. Note that we intentionally remove $\|K - K_r\|_F$ from the approximation error because $\|K - K_r\|_F$ is a lower bound on the error of any approximation by a matrix of rank $r$. Four UCI datasets are used in this empirical study, i.e., MNIST¹, a7a, diabetes², and CPU³. The RBF kernel $\kappa(\mathbf{x}, \mathbf{x}') = \exp(-\lambda\|\mathbf{x} - \mathbf{x}'\|_2^2/d^2)$ is used, where $d^2$ is the average squared distance between any two examples and $\lambda = 10$. The blue curves in Figure 1 show how the additional approximation error $\|K - \hat{K}_r\|_F - \|K - K_r\|_F$ varies with the rank $r$. The overall trend, as indicated in Figure 1, is that the higher the rank, the larger the additional approximation error tends to be. In order to explain the dependence of the approximation error on the rank, we examine the distribution of the eigengap $\lambda_r - \lambda_{r+1}$ over the rank. The red curves in Figure 1 show how the eigengap $\lambda_r - \lambda_{r+1}$ varies over the rank. Overall, we observe that the larger the rank, the smaller the eigengap. By combining the two observations, we conjecture that there is a strong dependence between the eigengap and the approximation error of the Nyström method. This motivates us to develop an eigengap dependent approximation error bound for the Nyström method. Our analysis shows that when the eigengap $\lambda_r - \lambda_{r+1}$ is sufficiently large, the approximation error of the Nyström method, measured in Frobenius norm, can be improved to $O(N/\sqrt{m})$, i.e.,

$$\|K - \hat{K}_r\|_F \le \|K - K_r\|_F + O\left(\frac{N}{\sqrt{m}}\right).$$

¹ http://yann.lecun.com/exdb/mnist/
² http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/
³ http://archive.ics.uci.edu/ml/datasets/
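A minimal NumPy sketch of the two quantities behind the experimental setup: the RBF kernel with $d^2$ set to the average squared pairwise distance and $\lambda = 10$, and the eigengaps $\lambda_r - \lambda_{r+1}$ over the rank. The synthetic Gaussian data, the helper names, and averaging $d^2$ over distinct pairs only are assumptions; this does not reproduce the paper's datasets or Figure 1.

```python
import numpy as np

def rbf_kernel(X, lam=10.0):
    """kappa(x, x') = exp(-lam * ||x - x'||_2^2 / d2), d2 = average squared distance.

    Averaging d2 over distinct pairs (i != j) is an assumption about "any two examples".
    """
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    np.fill_diagonal(D2, 0.0)                 # exact zeros so d2 averages over i != j only
    n = X.shape[0]
    d2 = D2.sum() / (n * (n - 1))
    return np.exp(-lam * D2 / d2)

def eigengaps(K):
    """Return lambda_r - lambda_{r+1} for r = 1, ..., N-1 (eigenvalues in descending order)."""
    lam = np.linalg.eigvalsh(K)[::-1]
    return lam[:-1] - lam[1:]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # synthetic stand-in, not one of the four datasets
K = rbf_kernel(X)
gaps = eigengaps(K)
print(gaps[:10])                              # eigengaps for ranks r = 1, ..., 10
```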



We note that although the concept of eigengap has been exploited in many studies of kernel learning (Bach & Jordan, 2003; Luxburg, 2007; Azran & Ghahramani, 2006; Shi et al., 2009), to the best of our knowledge, this is the first time it has been incorporated in the analysis of the Nyström method.

In the development of the Nyström method, another important issue is how to sample the columns of the kernel matrix. We restrict our analysis to uniform sampling. Although different sampling approaches have been suggested for the Nyström method (Drineas & Mahoney, 2005; Kumar et al., 2009; Zhang et al., 2008; Belabbas & Wolfe, 2009), according to (Kumar et al., 2009), uniform sampling seems to be the most efficient for real-world datasets and gives performance comparable to the other sampling approaches. We notice that in (Belabbas & Wolfe, 2009), the authors show a significantly better approximation bound for the Nyström method, both theoretically and empirically, when sampling the columns based on the determinant of the submatrix formed by the selected columns and rows, which is also referred to as determinantal processes (Hough et al., 2006). It is, however, important to point out that the determinantal process is usually computationally expensive, as it requires computing the determinant of the submatrix for the selected columns/rows, making it unsuitable when a large number of columns need to be sampled.



3 Approximation Error Bound by the Nyström Method

Let $\mathcal{D} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ be a collection of $N$ samples, and let $K = [\kappa(\mathbf{x}_i, \mathbf{x}_j)]_{N\times N}$ be the kernel matrix for the samples in $\mathcal{D}$, where $\kappa(\cdot,\cdot)$ is a kernel function. For simplicity, we assume $\kappa(\mathbf{x}, \mathbf{x}) \le 1$ for any $\mathbf{x} \in \mathcal{X}$. Let $\mathcal{H}_\kappa$ be the Reproducing Kernel Hilbert Space (RKHS) endowed with kernel $\kappa(\cdot,\cdot)$. We denote by $(\mathbf{v}_i, \lambda_i), i = 1, \ldots, N$, the eigenvectors and eigenvalues of $K$ ranked in descending order of eigenvalues. Define $V = (\mathbf{v}_1, \ldots, \mathbf{v}_N)$ and $V_{i,j} = [V]_{i,j}$.

In order to build a rank-$r$ approximation of the kernel matrix $K$, the Nyström method first samples $m < N$ examples randomly from $\mathcal{D}$, denoted by $\hat{\mathcal{D}} = \{\hat{\mathbf{x}}_1, \ldots, \hat{\mathbf{x}}_m\}$. It then computes a sample kernel matrix $\hat{K} = [\kappa(\hat{\mathbf{x}}_i, \hat{\mathbf{x}}_j)]_{m\times m}$. Let $(\hat{\mathbf{u}}_i, \hat{\lambda}_i), i = 1, \ldots, r$, be the first $r$ eigenvectors and eigenvalues of the matrix $\hat{K}$, and let $\hat{U} = (\hat{\mathbf{u}}_1, \ldots, \hat{\mathbf{u}}_r)$ and $\hat{U}_{i,j} = [\hat{U}]_{i,j}$. We assume that $\hat{\lambda}_r$ is strictly positive and define the matrix $W$ as

$$W = \sum_{i=1}^r \frac{1}{\hat{\lambda}_i}\hat{\mathbf{u}}_i\hat{\mathbf{u}}_i^\top.$$

The approximate low rank matrix $\hat{K}_r$, computed by the Nyström method, is given by

$$\hat{K}_r = K_b W K_b^\top,$$

where $K_b = [\kappa(\mathbf{x}_i, \hat{\mathbf{x}}_j)]_{N\times m}$ measures the similarity between the samples in $\mathcal{D}$ and $\hat{\mathcal{D}}$. As already mentioned, we focus on the scenario when the eigengap $\lambda_r - \lambda_{r+1}$ is sufficiently large⁴. Our analysis is mainly based on the concentration inequality of integral operators (Smale & Zhou, 2009) and matrix perturbation theory (Stewart & Sun, 1990).
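The construction of $\hat{K}_r$ can be sketched in NumPy as follows. This assumes uniform sampling without replacement and a synthetic Gaussian kernel; `nystrom` and `best_rank_r` are hypothetical helper names, and the positivity check mirrors the assumption $\hat{\lambda}_r > 0$.

```python
import numpy as np

def nystrom(K, m, r, rng):
    """Rank-r Nystrom approximation hat{K}_r = K_b W K_b^T from m sampled columns of K.

    Columns are drawn uniformly without replacement (an assumption; the text says "randomly").
    """
    N = K.shape[0]
    idx = rng.choice(N, size=m, replace=False)
    K_hat = K[np.ix_(idx, idx)]                 # sample kernel matrix, m x m
    K_b = K[:, idx]                             # similarities between D and hat{D}, N x m
    lam_hat, U_hat = np.linalg.eigh(K_hat)      # ascending order
    lam_hat, U_hat = lam_hat[::-1][:r], U_hat[:, ::-1][:, :r]
    if lam_hat[-1] <= 0:
        raise ValueError("hat(lambda)_r must be strictly positive")
    W = (U_hat / lam_hat) @ U_hat.T             # W = sum_i u_i u_i^T / lambda_i
    return K_b @ W @ K_b.T

def best_rank_r(K, r):
    """Best rank-r approximation K_r built from the top-r eigenpairs of K."""
    lam, V = np.linalg.eigh(K)
    lam, V = lam[::-1][:r], V[:, ::-1][:, :r]
    return (V * lam) @ V.T

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
sq = np.sum(X ** 2, axis=1)
K = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0) / 2.0)  # Gaussian kernel
r, m = 3, 60
K_r, K_hat_r = best_rank_r(K, r), nystrom(K, m, r, rng)
extra = np.linalg.norm(K - K_hat_r, "fro") - np.linalg.norm(K - K_r, "fro")
print(extra)   # additional approximation error, the quantity shown in Figure 1
```

Because $\hat{K}_r$ has rank at most $r$, the additional error is never negative. Setting $m = N$ makes $\hat{K}$ a symmetric permutation of $K$, and the construction then reproduces $K_r$ up to rounding.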

3.1 Preliminaries 

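We define the integral operators $L_N$ and $L_m$ based on the samples in $\mathcal{D}$ and $\hat{\mathcal{D}}$, respectively, as

$$L_N[f](\cdot) = \frac{1}{N}\sum_{i=1}^N \kappa(\mathbf{x}_i, \cdot)\,f(\mathbf{x}_i), \qquad L_m[f](\cdot) = \frac{1}{m}\sum_{i=1}^m \kappa(\hat{\mathbf{x}}_i, \cdot)\,f(\hat{\mathbf{x}}_i),$$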



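where $f$ is any function in $\mathcal{H}_\kappa$. The eigenvalues of the integral operators $L_N$ and $L_m$, according to (Smale & Zhou, 2009), are $\lambda_i/N, i \in [N]$, and $\hat{\lambda}_i/m, i \in [m]$, respectively. Let $\varphi_1(\cdot), \ldots, \varphi_N(\cdot)$ be the corresponding eigenfunctions of $L_N$, normalized by the functional norm, i.e., $\langle\varphi_i, \varphi_j\rangle_{\mathcal{H}_\kappa} = \delta(i,j), \forall (i,j) \in [N]\times[N]$. According to (Smale & Zhou, 2009), the eigenfunctions are given by

$$\varphi_j(\cdot) = \frac{1}{\sqrt{\lambda_j}}\sum_{i=1}^N V_{i,j}\,\kappa(\mathbf{x}_i, \cdot), \quad j \in [N]. \tag{3}$$

Using the eigenfunctions expressed in (3), we can write $\kappa(\mathbf{x}_j, \cdot), j \in [N]$, as

$$\kappa(\mathbf{x}_j, \cdot) = \sum_{i=1}^N \sqrt{\lambda_i}\,V_{j,i}\,\varphi_i(\cdot) = \sum_{i=1}^N \varphi_i(\mathbf{x}_j)\,\varphi_i(\cdot). \tag{4}$$

It is easy to verify that $L_N$ can be written in the basis $\varphi_i, i \in [N]$, as

$$L_N[f] = \frac{1}{N}\sum_{i=1}^N \lambda_i\,\langle\varphi_i, f\rangle_{\mathcal{H}_\kappa}\,\varphi_i. \tag{5}$$

⁴ The precise definition of a large eigengap will be given later.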
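Identities (3)-(5) can be sanity-checked at the sample points. The NumPy sketch below assumes a Laplacian kernel $\exp(-\|\mathbf{x} - \mathbf{x}'\|_2)$ on synthetic data, chosen because it is strictly positive definite (so every $\lambda_j > 0$ and (3) is well defined) and satisfies $\kappa(\mathbf{x}, \mathbf{x}) = 1$; it only evaluates each $\varphi_j$ at $\mathbf{x}_1, \ldots, \mathbf{x}_N$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
dist = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
K = np.exp(-dist)                        # Laplacian kernel: strictly p.d., kappa(x, x) = 1
N = K.shape[0]
lam, V = np.linalg.eigh(K)
lam, V = lam[::-1], V[:, ::-1]           # descending order, as in the text

# Equation (3) evaluated at the samples: Phi[i, j] = phi_j(x_i).
Phi = (K @ V) / np.sqrt(lam)
# <phi_i, phi_j> = v_i^T K v_j / sqrt(lam_i lam_j), by the reproducing property.
Gram = (V.T @ K @ V) / np.sqrt(np.outer(lam, lam))
# L_N[phi_j] evaluated at the samples: (1/N) sum_i kappa(x_i, x) phi_j(x_i).
LN_Phi = K @ Phi / N

print(np.allclose(Gram, np.eye(N), atol=1e-6))          # orthonormal eigenfunctions
print(np.allclose(Phi, V * np.sqrt(lam)))               # phi_j(x_i) = sqrt(lam_j) V_{i,j}
print(np.allclose(LN_Phi, Phi * (lam / N)))             # eigenvalues lam_j / N
print(np.allclose((Phi ** 2).sum(axis=1), np.diag(K)))  # equation (4) at x = x_j
```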



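Let $\hat{\varphi}_j, j \in [m]$, be the corresponding eigenfunctions of the integral operator $L_m$. Similar to $L_N$, the eigenfunction $\hat{\varphi}_j$ is given by

$$\hat{\varphi}_j(\cdot) = \frac{1}{\sqrt{\hat{\lambda}_j}}\sum_{i=1}^m \hat{U}_{i,j}\,\kappa(\hat{\mathbf{x}}_i, \cdot), \quad j \in [r]. \tag{6}$$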

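We define the Hilbert-Schmidt norm of an operator $L: \mathcal{H}_\kappa \to \mathcal{H}_\kappa$ by

$$\|L\|_{HS} = \sqrt{\sum_{i,j=1}^N \langle\varphi_i, L\varphi_j\rangle_{\mathcal{H}_\kappa}^2}. \tag{7}$$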



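Let $\|L\|_2$ denote the spectral norm of operator $L$, defined by

$$\|L\|_2 = \max_{\|f\|_{\mathcal{H}_\kappa} \le 1,\ \|g\|_{\mathcal{H}_\kappa} \le 1} \langle f, Lg\rangle_{\mathcal{H}_\kappa},$$

where $\langle\cdot,\cdot\rangle_{\mathcal{H}_\kappa}$ denotes the inner product in the Hilbert space $\mathcal{H}_\kappa$. In the sequel, we use $\langle\cdot,\cdot\rangle$ for short.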

In the following, we state the concentration inequality used to compare the two integral operators.

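Lemma 1. (Proposition 1 of (Smale & Zhou, 2009)) Let $\xi$ be a random variable on $(\mathcal{X}, P_\mathcal{X})$ with values in a Hilbert space $(\mathcal{H}, \|\cdot\|)$. Assume $\|\xi\| \le M < \infty$ almost surely. Then, for $\mathbf{x}_1, \ldots, \mathbf{x}_m$ drawn independently from $P_\mathcal{X}$, with a probability at least $1 - \delta$, we have

$$\left\|\frac{1}{m}\sum_{i=1}^m \xi(\mathbf{x}_i) - \mathbb{E}[\xi]\right\| \le \frac{4M\ln(2/\delta)}{\sqrt{m}}.$$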

Theorem 1. With a probability at least $1 - \delta$, we have

$$\|L_N - L_m\|_{HS} \le \frac{4\ln(2/\delta)}{\sqrt{m}},$$

where $\|L\|_{HS}$ is defined in equation (7).

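Proof. Define $\xi(\hat{\mathbf{x}}_i)$ as a rank one linear operator, i.e.,

$$\xi(\hat{\mathbf{x}}_i)[f](\cdot) = \kappa(\hat{\mathbf{x}}_i, \cdot)\,f(\hat{\mathbf{x}}_i).$$

Clearly, $L_m = \frac{1}{m}\sum_{i=1}^m \xi(\hat{\mathbf{x}}_i)$ and, since each $\hat{\mathbf{x}}_i$ is drawn uniformly at random from $\mathcal{D}$, $\mathbb{E}[\xi(\hat{\mathbf{x}}_i)] = L_N$. Let $\|\cdot\|_{HS}$ be the norm used in Lemma 1. We complete the proof by using the result from Lemma 1 and the fact

$$\|\xi(\hat{\mathbf{x}}_k)\|_{HS} = \sqrt{\sum_{i,j=1}^N \langle\varphi_i, \xi(\hat{\mathbf{x}}_k)\varphi_j\rangle^2} = \sqrt{\sum_{i,j=1}^N \varphi_i(\hat{\mathbf{x}}_k)^2\,\varphi_j(\hat{\mathbf{x}}_k)^2} = \kappa(\hat{\mathbf{x}}_k, \hat{\mathbf{x}}_k) \le 1,$$

where the last equality follows from equation (4). □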
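Theorem 1 can be checked empirically. The plain-Python sketch below assumes i.i.d. uniform sampling with replacement (which is what $\mathbb{E}[\xi(\hat{\mathbf{x}}_i)] = L_N$ requires), a synthetic Gaussian kernel, and hypothetical helper names. It computes $\|L_N - L_m\|_{HS}$ exactly from kernel values: when $K$ is nonsingular, both operators vanish outside the span of $\{\kappa(\mathbf{x}_i, \cdot)\}$, so the norm in (7) equals the usual Hilbert-Schmidt norm.

```python
import math
import random

def gaussian_kernel(x, y, width=1.0):
    """Gaussian kernel; kappa(x, x) = 1, so the assumption kappa(x, x) <= 1 holds."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * width ** 2))

def hs_distance(K, counts, m):
    """||L_N - L_m||_HS, where counts[i] is how often x_i was drawn and m = sum(counts).

    L_N - L_m = sum_i w_i kappa(x_i, .) (x) kappa(x_i, .) with w_i = 1/N - counts[i]/m, and
    <a (x) a, b (x) b>_HS = <a, b>^2, so the squared norm is sum_{i,j} w_i w_j kappa(x_i, x_j)^2.
    """
    N = len(K)
    w = [1.0 / N - c / m for c in counts]
    sq = sum(w[i] * w[j] * K[i][j] ** 2 for i in range(N) for j in range(N))
    return math.sqrt(max(sq, 0.0))            # guard against tiny negative round-off

def theorem1_bound(m, delta):
    """Right-hand side of Theorem 1: 4 ln(2/delta) / sqrt(m)."""
    return 4.0 * math.log(2.0 / delta) / math.sqrt(m)

rng = random.Random(0)
points = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(100)]
K = [[gaussian_kernel(p, q) for q in points] for p in points]
results = []
for m in (25, 100, 400):
    counts = [0] * len(points)
    for _ in range(m):
        counts[rng.randrange(len(points))] += 1   # i.i.d. uniform draws from D
    results.append((m, hs_distance(K, counts, m), theorem1_bound(m, delta=0.05)))
for row in results:
    print(row)
```

The observed distances are typically well below the bound, which is expected of a worst-case guarantee with loose constants.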



3.2 Bounding the Approximation Error by Operator Norm 

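Based on the first $r$ eigenfunctions of $L_N$ and $L_m$, we define two additional linear operators $H_r$ and $\hat{H}_r$ as

$$H_r[f] = \sum_{i=1}^r \langle\varphi_i, f\rangle\,\varphi_i, \qquad \hat{H}_r[f] = \sum_{i=1}^r \langle\hat{\varphi}_i, f\rangle\,\hat{\varphi}_i.$$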
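The following proposition relates $H_r$ and $\hat{H}_r$ to the matrices $K_r$ and $\hat{K}_r$, respectively.

Proposition 1. Assume $\lambda_r > 0$ and $\hat{\lambda}_r > 0$. For any $(i,j) \in [N]\times[N]$, we have

$$[K_r]_{i,j} = \langle\kappa(\mathbf{x}_i, \cdot), H_r\kappa(\mathbf{x}_j, \cdot)\rangle, \qquad [\hat{K}_r]_{i,j} = \langle\kappa(\mathbf{x}_i, \cdot), \hat{H}_r\kappa(\mathbf{x}_j, \cdot)\rangle.$$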

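Proof. By the definition of $\hat{H}_r$ and equation (6), we have

$$\begin{aligned}
\langle\kappa(\mathbf{x}_i, \cdot), \hat{H}_r\kappa(\mathbf{x}_j, \cdot)\rangle
&= \sum_{k=1}^r \hat{\varphi}_k(\mathbf{x}_i)\,\hat{\varphi}_k(\mathbf{x}_j) \\
&= \sum_{a,b=1}^m \sum_{k=1}^r \frac{1}{\hat{\lambda}_k}\hat{U}_{a,k}\hat{U}_{b,k}\,\kappa(\mathbf{x}_i, \hat{\mathbf{x}}_a)\,\kappa(\hat{\mathbf{x}}_b, \mathbf{x}_j) \\
&= \sum_{a,b=1}^m \kappa(\mathbf{x}_i, \hat{\mathbf{x}}_a)\,W_{a,b}\,\kappa(\hat{\mathbf{x}}_b, \mathbf{x}_j) \\
&= [K_b W K_b^\top]_{i,j} = [\hat{K}_r]_{i,j}.
\end{aligned}$$

Using the fact that $K_r = K\bar{W}K$, where $\bar{W} = \sum_{k=1}^r \lambda_k^{-1}\mathbf{v}_k\mathbf{v}_k^\top$, we apply the same proof, with equation (3) in place of equation (6), to $K_r$. □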
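At the sample points, Proposition 1 amounts to two matrix identities, checked below with NumPy. The synthetic Gaussian kernel, the sizes, and sampling without replacement are assumptions for illustration; `W_bar` denotes $\bar{W} = \sum_{k=1}^r \lambda_k^{-1}\mathbf{v}_k\mathbf{v}_k^\top$, so that $K\bar{W}K = K_r$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))
sq = np.sum(X ** 2, axis=1)
K = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0) / 2.0)
N, m, r = K.shape[0], 30, 3

idx = rng.choice(N, size=m, replace=False)          # indices of hat{D} within D
K_hat, K_b = K[np.ix_(idx, idx)], K[:, idx]
lam_hat, U_hat = np.linalg.eigh(K_hat)
lam_hat, U_hat = lam_hat[::-1][:r], U_hat[:, ::-1][:, :r]

# Equation (6) at every x_i in D: Phi_hat[i, k] = hat{phi}_k(x_i).
Phi_hat = (K_b @ U_hat) / np.sqrt(lam_hat)
W = (U_hat / lam_hat) @ U_hat.T
# <kappa(x_i,.), hat{H}_r kappa(x_j,.)> = sum_k hat{phi}_k(x_i) hat{phi}_k(x_j)
hat_ok = np.allclose(Phi_hat @ Phi_hat.T, K_b @ W @ K_b.T)

# Same identity for K_r = K W_bar K, with phi_k(x_i) = sqrt(lam_k) V_{i,k} from equation (3).
lam, V = np.linalg.eigh(K)
lam, V = lam[::-1][:r], V[:, ::-1][:, :r]
Phi = V * np.sqrt(lam)
W_bar = (V / lam) @ V.T
bar_ok = np.allclose(Phi @ Phi.T, K @ W_bar @ K)
print(hat_ok, bar_ok)
```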

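Next, we relate $\|K_r - \hat{K}_r\|_F$ to $\Delta H = H_r - \hat{H}_r$. Note that $L_N$, $H_r$, and $\hat{H}_r$ are self-adjoint operators, and so is $\Delta H$. In the proofs that follow, we repeatedly use

$$\langle f, \Delta H g\rangle = \langle\Delta H f, g\rangle, \qquad N L_N[f] = \sum_{i=1}^N \kappa(\mathbf{x}_i, \cdot)\,\langle f, \kappa(\mathbf{x}_i, \cdot)\rangle.$$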

Theorem 2. Assume $\lambda_r > 0$ and $\hat{\lambda}_r > 0$. We have

$$\|K_r - \hat{K}_r\|_F \le \sqrt{\lambda_1 N}\,\|\Delta H\|_2 \le N\|\Delta H\|_2.$$

The proof can be found in Appendix A. As indicated by Theorem 2, to bound $\|K_r - \hat{K}_r\|_F$, the key is to bound the spectral norm of the operator $\Delta H$.
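Theorem 2 can be probed numerically by representing $\Delta H$ as an $N\times N$ matrix $M$ in the orthonormal basis $\{\varphi_i\}$: $H_r$ becomes $\mathrm{diag}(I_r, 0)$ and $\hat{H}_r$ becomes $AA^\top$ with $A_{a,i} = \langle\varphi_a, \hat{\varphi}_i\rangle$, which follows from (3) and (6) together with $V^\top K = \mathrm{diag}(\lambda)V^\top$. The NumPy sketch assumes a synthetic Gaussian kernel and sampling without replacement, and clips round-off negative eigenvalues of $K$ to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 2))
sq = np.sum(X ** 2, axis=1)
K = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0) / 2.0)
N, m, r = K.shape[0], 40, 3

lam, V = np.linalg.eigh(K)
lam, V = np.clip(lam[::-1], 0.0, None), V[:, ::-1]      # descending; clip round-off negatives
idx = rng.choice(N, size=m, replace=False)
lam_hat, U_hat = np.linalg.eigh(K[np.ix_(idx, idx)])
lam_hat, U_hat = lam_hat[::-1][:r], U_hat[:, ::-1][:, :r]

# A[a, i] = <phi_a, hat{phi}_i> = sqrt(lam_a / hat{lam}_i) * sum_q V[idx[q], a] U_hat[q, i]
A = (np.sqrt(lam)[:, None] * V[idx, :].T) @ U_hat / np.sqrt(lam_hat)
J = np.zeros((N, N))
J[:r, :r] = np.eye(r)
M = J - A @ A.T                        # Delta H = H_r - hat{H}_r in the basis phi_1..phi_N
dH = np.abs(np.linalg.eigvalsh(M)).max()   # ||Delta H||_2, since M is symmetric
D = np.diag(lam)

K_r = (V[:, :r] * lam[:r]) @ V[:, :r].T
K_b = K[:, idx]
K_hat_r = K_b @ ((U_hat / lam_hat) @ U_hat.T) @ K_b.T
lhs = np.linalg.norm(K_r - K_hat_r, "fro")
print(lhs, np.sqrt(lam[0] * N) * dH, N * dH)
print(lhs ** 2, np.trace(M @ D @ M @ D))
```

The second printed pair should agree: $\|K_r - \hat{K}_r\|_F^2 = \mathrm{tr}(MDMD)$ with $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$ is the equality that precedes the first inequality in the proof of Theorem 2.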

3.3 Bounding the Operator Norm by Matrix Perturbation Theory

Our next goal is to bound the spectral norm of $\Delta H$. To this end, we assume a large eigengap between $\lambda_r$ and $\lambda_{r+1}$, i.e., $\Delta = (\lambda_r - \lambda_{r+1})/N$ is sufficiently large. Note that we normalize $\lambda_r - \lambda_{r+1}$ by $N$, the size of the dataset $\mathcal{D}$, when defining $\Delta$, so that $\Delta$ is exactly the gap between the $r$-th and $(r+1)$-th eigenvalues of $L_N$. The eigengap is the key quantity in the application of matrix perturbation theory (Stewart & Sun, 1990). The following perturbation result from (Stewart & Sun, 1990) forms the foundation of our analysis⁵.

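Theorem 3. (Theorem 2.7 of Chapter 6 in (Stewart & Sun, 1990)) Let $(\lambda_i, \mathbf{v}_i), i \in [n]$, be the eigenvalues and eigenvectors of a symmetric matrix $A \in \mathbb{R}^{n\times n}$ ranked in descending order of eigenvalues. Set $X = (\mathbf{v}_1, \ldots, \mathbf{v}_r)$ and $Y = (\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n)$. Given a symmetric perturbation matrix $E$, let

$$\hat{E} = (X, Y)^\top E\,(X, Y) = \begin{pmatrix} \hat{E}_{11} & \hat{E}_{12} \\ \hat{E}_{21} & \hat{E}_{22} \end{pmatrix}.$$

Let $\|\cdot\|$ represent a consistent family of norms and set

$$\gamma = \|\hat{E}_{21}\|, \qquad \delta = \lambda_r - \lambda_{r+1} - \|\hat{E}_{11}\| - \|\hat{E}_{22}\|.$$

If $\delta > 0$ and $2\gamma < \delta$, then there exists a unique matrix $P \in \mathbb{R}^{(n-r)\times r}$ satisfying $\|P\| < 2\gamma/\delta$ such that

$$X' = (X + YP)(I + P^\top P)^{-1/2}, \qquad Y' = (Y - XP^\top)(I + PP^\top)^{-1/2}$$

are the eigenvectors of $A + E$.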

⁵ We simplify the statement to make it better fit our objective.
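Theorem 3 is easy to probe numerically. The NumPy sketch below uses a small diagonal matrix and a random symmetric perturbation chosen purely for illustration (`perturbation_check` is a hypothetical helper). It recovers $P$ from the top-$r$ eigenvectors $Q_r$ of $A + E$ via $P = (Y^\top Q_r)(X^\top Q_r)^{-1}$, compares $\|P\|_F$ with $2\gamma/\delta$ in the Frobenius norm, and forms $(X + YP)(I + P^\top P)^{-1/2}$.

```python
import numpy as np

def perturbation_check(a_diag, E, r):
    """Return X, Y, P and the quantities gamma, delta of Theorem 3 (Frobenius norms).

    A = diag(a_diag) with a_diag sorted in descending order, so X and Y are identity blocks.
    """
    n = len(a_diag)
    X, Y = np.eye(n)[:, :r], np.eye(n)[:, r:]
    E11, E21, E22 = X.T @ E @ X, Y.T @ E @ X, Y.T @ E @ Y
    gamma = np.linalg.norm(E21, "fro")
    delta = a_diag[r - 1] - a_diag[r] - np.linalg.norm(E11, "fro") - np.linalg.norm(E22, "fro")
    _, Q = np.linalg.eigh(np.diag(a_diag) + E)
    Q_r = Q[:, ::-1][:, :r]                      # top-r eigenvectors of A + E
    P = (Y.T @ Q_r) @ np.linalg.inv(X.T @ Q_r)   # span(Q_r) = span(X + Y P)
    return X, Y, P, gamma, delta

rng = np.random.default_rng(0)
G = 0.05 * rng.normal(size=(6, 6))
E = (G + G.T) / 2.0                               # small symmetric perturbation
a_diag = np.array([10.0, 9.0, 8.0, 2.0, 1.0, 0.5])
X, Y, P, gamma, delta = perturbation_check(a_diag, E, r=3)

# Orthonormalize X + Y P as in Theorem 3: (X + Y P)(I + P^T P)^{-1/2}.
s, Q = np.linalg.eigh(np.eye(3) + P.T @ P)
Xprime = (X + Y @ P) @ (Q / np.sqrt(s)) @ Q.T
print(np.linalg.norm(P, "fro"), 2 * gamma / delta)
```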
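Define

$$\Phi = (\varphi_1, \ldots, \varphi_r), \qquad \bar{\Phi} = (\varphi_{r+1}, \ldots, \varphi_N), \qquad \Theta = (\hat{\varphi}_1, \ldots, \hat{\varphi}_r).$$

The following theorem allows us to relate $\Theta$ with $\Phi$ and $\bar{\Phi}$.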
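Theorem 4. Assume

$$\Delta = \frac{\lambda_r - \lambda_{r+1}}{N} > 3\|L_N - L_m\|_{HS}.$$

Then, there exists a matrix $P \in \mathbb{R}^{(N-r)\times r}$ satisfying

$$\|P\|_F \le \frac{2\|L_N - L_m\|_{HS}}{\Delta - \|L_N - L_m\|_{HS}} \le \frac{3\|L_N - L_m\|_{HS}}{\Delta}$$

such that

$$\Theta = (\Phi + \bar{\Phi}P)(I + P^\top P)^{-1/2}.$$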

The proof can be found in Appendix B. As indicated by Theorem 4, when the eigengap $\Delta$ is sufficiently large, $\|P\|_F$ is small and therefore $\Theta \approx \Phi$, implying that the eigenfunctions $\{\hat{\varphi}_i\}_{i=1}^r$ computed from the samples in $\hat{\mathcal{D}}$ are good approximations of the eigenfunctions $\{\varphi_i\}_{i=1}^r$ of $L_N$. As a result, when the eigengap $\Delta$ is sufficiently large, we expect a small difference between $H_r$ and $\hat{H}_r$, because they are constructed from the eigenfunctions $\{\varphi_i\}_{i=1}^r$ and $\{\hat{\varphi}_i\}_{i=1}^r$, respectively. This is shown in the next theorem.

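Theorem 5. Assume

$$\Delta = \frac{\lambda_r - \lambda_{r+1}}{N} > 3\|L_N - L_m\|_{HS}.$$

We have

$$\|\Delta H\|_2 \le \frac{4\|L_N - L_m\|_{HS}}{\Delta - \|L_N - L_m\|_{HS}} \le \frac{6\|L_N - L_m\|_{HS}}{\Delta}.$$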

The proof can be found in Appendix C. By combining the results from Theorems 1, 2, and 5, we obtain the final theorem for the approximation error of the Nyström method measured in Frobenius norm.



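Theorem 6. Assume

$$\Delta = \frac{\lambda_r - \lambda_{r+1}}{N} > 3\|L_N - L_m\|_{HS}.$$

We have

$$\|K_r - \hat{K}_r\|_F \le \frac{4N\|L_N - L_m\|_{HS}}{\Delta - \|L_N - L_m\|_{HS}} \le \frac{6N\|L_N - L_m\|_{HS}}{\Delta}.$$

If the eigengap satisfies

$$\Delta = \frac{\lambda_r - \lambda_{r+1}}{N} > \frac{12\ln(2/\delta)}{\sqrt{m}},$$

then, with a probability at least $1 - \delta$, we have

$$\|K_r - \hat{K}_r\|_F \le O\left(\frac{N}{\Delta\sqrt{m}}\right).$$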



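Proof. The proof is simply the combination of the results from Theorems 1, 2, and 5:

$$\|K_r - \hat{K}_r\|_F \le N\|\Delta H\|_2 \le \frac{4N\|L_N - L_m\|_{HS}}{\Delta - \|L_N - L_m\|_{HS}} \le \frac{6N\|L_N - L_m\|_{HS}}{\Delta} \le \frac{24N\ln(2/\delta)}{\Delta\sqrt{m}},$$

where the third inequality follows from $\|L_N - L_m\|_{HS} < \Delta/3$ and the last inequality follows from Theorem 1. Note that both conditions $\lambda_r > 0$ and $\hat{\lambda}_r > 0$, specified in Theorem 2, hold with a high probability. It is obvious that $\lambda_r > 0$ because $\lambda_r > \lambda_{r+1}$ and $\lambda_{r+1} \ge 0$. To show that $\hat{\lambda}_r > 0$ holds with a high probability, we apply Lidskii's inequality (Koltchinskii & Giné, 2000) to the eigenvalues $\lambda_r/N$ and $\hat{\lambda}_r/m$ of $L_N$ and $L_m$, i.e.,

$$\frac{N}{m}\hat{\lambda}_r \ge \lambda_r - N\|L_N - L_m\|_{HS}.$$

Since, with a probability at least $1 - \delta$, $\lambda_r - \lambda_{r+1} > 3N\|L_N - L_m\|_{HS}$ holds, we have, with a probability at least $1 - \delta$,

$$\frac{N}{m}\hat{\lambda}_r > \lambda_r - \frac{\lambda_r - \lambda_{r+1}}{3} = \frac{2}{3}\lambda_r + \frac{1}{3}\lambda_{r+1} > 0.$$

□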
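To see what this asks of the sample size: combining Theorem 1 with the requirement $\Delta > 3\|L_N - L_m\|_{HS}$ gives the sufficient condition $\Delta > 12\ln(2/\delta)/\sqrt{m}$, and chaining Theorems 1, 2, and 5 gives the explicit bound $24N\ln(2/\delta)/(\Delta\sqrt{m})$. The sketch below turns the condition into a minimal $m$ and evaluates the bound; the values of $N$, $\Delta$, and $\delta$ are arbitrary illustrations and the helper names are hypothetical.

```python
import math

def min_samples_for_gap(gap, delta):
    """Smallest integer m with gap > 12 ln(2/delta) / sqrt(m), gap = (lambda_r - lambda_{r+1}) / N."""
    return math.floor((12.0 * math.log(2.0 / delta) / gap) ** 2) + 1

def theorem6_bound(N, gap, m, delta):
    """Explicit bound 24 N ln(2/delta) / (gap * sqrt(m)) on ||K_r - hat{K}_r||_F."""
    return 24.0 * N * math.log(2.0 / delta) / (gap * math.sqrt(m))

N, gap, delta = 10_000, 0.5, 0.05
m0 = min_samples_for_gap(gap, delta)
for m in (m0, 4 * m0, 16 * m0):
    print(m, theorem6_bound(N, gap, m, delta))
```

At the smallest admissible $m$ the bound is just below $2N$; it then halves each time $m$ is multiplied by four, reflecting the $O(N/(\Delta\sqrt{m}))$ rate.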



Remark. Besides the improved bound for the Nyström method, Theorem 6 also explains the results shown in Figure 1. By the triangle inequality, the additional approximation error $\|K - \hat{K}_r\|_F - \|K - K_r\|_F$ is upper bounded by $\|K_r - \hat{K}_r\|_F$. According to Theorem 6, we would therefore expect the bound on the additional approximation error to be inversely related to the eigengap $\lambda_r - \lambda_{r+1}$, i.e., the larger the eigengap, the smaller the additional approximation error.



4 Conclusion 

In this paper, we tried to bridge the gap between the effectiveness of the Nyström method in practice and its poor theoretical approximation error bounds. In particular, in the case of a large eigengap, we developed an improved bound for the approximation error of the Nyström method, based on the concentration inequality of integral operators and the theory of matrix perturbation. In the future, we plan to develop better bounds for the Nyström method that take into account kernel matrices whose eigenvalues follow a power law.






Appendix A: Proof of Theorem 2

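Since $K_r = K\bar{W}K$ with $\bar{W} = \sum_{k=1}^r \lambda_k^{-1}\mathbf{v}_k\mathbf{v}_k^\top$, and $\hat{K}_r = K_b W K_b^\top$, Proposition 1 gives

$$\|K\bar{W}K - K_b W K_b^\top\|_F^2 = \sum_{i=1}^N\sum_{j=1}^N \langle\Delta H\kappa(\mathbf{x}_i, \cdot), \kappa(\mathbf{x}_j, \cdot)\rangle\,\langle\Delta H\kappa(\mathbf{x}_i, \cdot), \kappa(\mathbf{x}_j, \cdot)\rangle.$$

Using the fact that $\sum_{j=1}^N \langle f, \kappa(\mathbf{x}_j, \cdot)\rangle\langle f, \kappa(\mathbf{x}_j, \cdot)\rangle = N\langle f, L_N f\rangle$, we have

$$\|K\bar{W}K - K_b W K_b^\top\|_F^2 = N\sum_{i=1}^N \langle\Delta H\kappa(\mathbf{x}_i, \cdot), L_N\Delta H\kappa(\mathbf{x}_i, \cdot)\rangle = N\sum_{i=1}^N \langle\kappa(\mathbf{x}_i, \cdot), \Delta H L_N\Delta H\kappa(\mathbf{x}_i, \cdot)\rangle.$$

We further simplify the expression by using the fact that, for any linear operator $Z$, we have

$$\sum_{i=1}^N \langle\kappa(\mathbf{x}_i, \cdot), Z\kappa(\mathbf{x}_i, \cdot)\rangle = N\sum_{i=1}^N \langle\varphi_i, (Z L_N)\varphi_i\rangle.$$

Using the above result with $Z = \Delta H L_N \Delta H$, together with $L_N\varphi_i = (\lambda_i/N)\varphi_i$, we have

$$\begin{aligned}
\|K\bar{W}K - K_b W K_b^\top\|_F^2 &= N^2\sum_{i=1}^N \langle\varphi_i, (\Delta H L_N\Delta H L_N)\varphi_i\rangle = N\sum_{i=1}^N \lambda_i\langle\varphi_i, (\Delta H L_N\Delta H)\varphi_i\rangle \\
&\le N\lambda_1\sum_{i=1}^N \langle\Delta H\varphi_i, L_N\Delta H\varphi_i\rangle = \lambda_1\sum_{i=1}^N\sum_{j=1}^N \lambda_j\langle\varphi_j, \Delta H\varphi_i\rangle^2,
\end{aligned}$$

where the inequality uses that $L_N$ is positive semidefinite and the last equality follows from equation (5). Define the matrix $A = [\langle\varphi_i, \Delta H\varphi_j\rangle]_{N\times N}$ and $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$. Since $\sum_{i=1}^N \lambda_i = \mathrm{tr}(K) \le N$, we have

$$\|K\bar{W}K - K_b W K_b^\top\|_F^2 \le \lambda_1\,\mathrm{tr}(ADA) \le \lambda_1\|A\|_2^2\sum_{i=1}^N \lambda_i \le \lambda_1 N\|\Delta H\|_2^2,$$

where the last step follows from $\|A\|_2 = \|\Delta H\|_2$. Taking square roots gives the first inequality of Theorem 2, and the second follows from $\lambda_1 \le \mathrm{tr}(K) \le N$.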



Appendix B: Proof of Theorem 4



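Define the matrix $B \in \mathbb{R}^{N\times N}$ as

$$B_{i,j} = \langle\varphi_i, L_m\varphi_j\rangle = \frac{1}{m}\sum_{k=1}^m \varphi_i(\hat{\mathbf{x}}_k)\,\varphi_j(\hat{\mathbf{x}}_k), \quad i, j \in [N].$$

Let $\mathbf{z}_i$ be the eigenvector of $B$ corresponding to eigenvalue $\hat{\lambda}_i/m$. Since $\hat{\mathcal{D}} \subseteq \mathcal{D}$, equation (4) shows that the range of $L_m$ lies in the span of $\{\varphi_k\}_{k=1}^N$, so it is straightforward to show that

$$L_m\Big[\sum_{k=1}^N z_{i,k}\varphi_k\Big] = \sum_{k=1}^N [B\mathbf{z}_i]_k\,\varphi_k = \frac{\hat{\lambda}_i}{m}\sum_{k=1}^N z_{i,k}\varphi_k,$$

and therefore we have

$$\hat{\varphi}_i = \sum_{k=1}^N z_{i,k}\varphi_k, \quad i \in [r], \qquad \text{or} \qquad \Theta = (\Phi, \bar{\Phi})Z,$$

where $Z = (\mathbf{z}_1, \ldots, \mathbf{z}_r)$. To decide the relationship between $\{\hat{\varphi}_i\}_{i=1}^r$ and $\{\varphi_i\}_{i=1}^N$, we need to determine the matrix $Z$. We define the matrix $D = \mathrm{diag}(\lambda_1/N, \ldots, \lambda_N/N)$ and the matrix $E = B - D$, i.e.,

$$E_{i,j} = B_{i,j} - \frac{\lambda_i\delta_{i,j}}{N} = \langle\varphi_i, (L_m - L_N)\varphi_j\rangle_{\mathcal{H}_\kappa}.$$

Following the notation of Theorem 3, we define $X = (\mathbf{e}_1, \ldots, \mathbf{e}_r)$ and $Y = (\mathbf{e}_{r+1}, \ldots, \mathbf{e}_N)$, where $\mathbf{e}_1, \ldots, \mathbf{e}_N$ are the canonical bases of $\mathbb{R}^N$, which are also eigenvectors of $D$; in particular, $\hat{E} = (X, Y)^\top E\,(X, Y) = E$. Define $\delta$ and $\gamma$ as follows:

$$\gamma = \sqrt{\sum_{i=1}^r\sum_{j=r+1}^N \langle\varphi_i, (L_N - L_m)\varphi_j\rangle^2}, \qquad \delta = \Delta - \sqrt{\sum_{i,j=1}^r \langle\varphi_i, (L_N - L_m)\varphi_j\rangle^2} - \sqrt{\sum_{i,j=r+1}^N \langle\varphi_i, (L_N - L_m)\varphi_j\rangle^2}.$$

It is easy to verify that $\gamma$ and $\delta$ are defined with respect to the Frobenius norm of $E$ in Theorem 3, since the eigengap of $D$ is $(\lambda_r - \lambda_{r+1})/N = \Delta$. In order to apply the result in Theorem 3, we need to show $\delta > 0$ and $\gamma < \delta/2$. To this end, we provide an upper bound for $\gamma$ and a lower bound for $\delta - 2\gamma$. Write $\epsilon = \|L_N - L_m\|_{HS}$, $a = \|\hat{E}_{11}\|_F$, and $b = \|\hat{E}_{22}\|_F$. Since $E$ is symmetric, equation (7) gives $a^2 + b^2 + 2\gamma^2 = \|E\|_F^2 = \epsilon^2$. We first bound $\gamma$ as

$$\gamma \le \sqrt{\frac{1}{2}\sum_{i,j=1}^N \langle\varphi_i, (L_N - L_m)\varphi_j\rangle^2} = \frac{\epsilon}{\sqrt{2}} \le \|L_N - L_m\|_{HS}.$$

We then bound $\delta - 2\gamma$. By the Cauchy-Schwarz inequality, $a + b + 2\gamma \le \sqrt{1 + 1 + 2}\,\sqrt{a^2 + b^2 + 2\gamma^2} = 2\epsilon$, so

$$\delta - 2\gamma = \Delta - (a + b + 2\gamma) \ge \Delta - 2\|L_N - L_m\|_{HS}.$$

Hence, when $\Delta > 3\|L_N - L_m\|_{HS}$, we have $\delta > 2\gamma \ge 0$, which satisfies the condition specified in Theorem 3. Thus, according to Theorem 3, there exists a matrix $P \in \mathbb{R}^{(N-r)\times r}$ satisfying $\|P\|_F < 2\gamma/\delta$ such that $(X + YP)(I + P^\top P)^{-1/2}$ are eigenvectors of $B = D + E$, implying

$$Z = (\mathbf{z}_1, \ldots, \mathbf{z}_r) = (X + YP)(I + P^\top P)^{-1/2}, \qquad \Theta = (\Phi, \bar{\Phi})Z = (\Phi + \bar{\Phi}P)(I + P^\top P)^{-1/2}.$$

Finally, since $\delta \ge \Delta - 2\epsilon + 2\gamma$ with $\Delta - 2\epsilon > 0$, $2\gamma \le 2\epsilon$, and $t \mapsto t/(\Delta - 2\epsilon + t)$ is increasing for $t \ge 0$, we obtain

$$\|P\|_F < \frac{2\gamma}{\Delta - 2\epsilon + 2\gamma} \le \frac{2\epsilon}{\Delta} \le \frac{2\epsilon}{\Delta - \epsilon} \le \frac{3\epsilon}{\Delta},$$

where the last step uses $\Delta > 3\epsilon$.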



Appendix C: Proof of Theorem 5

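To bound $\|\Delta H\|_2$, it is sufficient to bound $\max_{\|f\|_{\mathcal{H}_\kappa} \le 1} |\langle f, \Delta H f\rangle|$, since $\Delta H$ is self-adjoint. Moreover, $\Delta H$ maps into the span of $\{\varphi_i\}_{i=1}^N$ and vanishes on its orthogonal complement, so we only need to consider functions $f(\cdot) = \sum_{i=1}^N f_i\varphi_i(\cdot)$ with $\|f\|_{\mathcal{H}_\kappa} \le 1$. Let $\mathbf{f} = (f_1, \ldots, f_N)^\top$. Evidently, we have $\|\mathbf{f}\|_2 \le 1$. We have

$$\langle f, H_r f\rangle = \sum_{i=1}^r\sum_{a,b=1}^N f_a f_b\,\langle\varphi_a, \varphi_i\rangle\langle\varphi_b, \varphi_i\rangle = \mathbf{f}^\top\begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}\mathbf{f},$$

$$\langle f, \hat{H}_r f\rangle = \sum_{i=1}^r\sum_{a,b=1}^N f_a f_b\,\langle\varphi_a, \hat{\varphi}_i\rangle\langle\varphi_b, \hat{\varphi}_i\rangle = \mathbf{f}^\top AA^\top\mathbf{f},$$

where $A = [\langle\varphi_i, \hat{\varphi}_j\rangle]_{N\times r} = (\Phi, \bar{\Phi})^\top\Theta$. Since $\Delta > 3\|L_N - L_m\|_{HS}$, according to Theorem 4, there exists a matrix $P \in \mathbb{R}^{(N-r)\times r}$ satisfying

$$\|P\|_F \le \frac{2\|L_N - L_m\|_{HS}}{\Delta - \|L_N - L_m\|_{HS}} \le \frac{3\|L_N - L_m\|_{HS}}{\Delta}$$

such that

$$\Theta = (\Phi + \bar{\Phi}P)(I + P^\top P)^{-1/2}.$$

Using the expression of $\Theta$, we compute $A$ as

$$A = \begin{pmatrix} I \\ P \end{pmatrix}(I + P^\top P)^{-1/2}.$$

Thus, we have

$$\langle f, \Delta H f\rangle = \mathbf{f}^\top\left[\begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} - AA^\top\right]\mathbf{f} = \mathbf{f}^\top C\,\mathbf{f},$$

where $C$ is given by

$$C = \begin{pmatrix} I - (I + P^\top P)^{-1} & -(I + P^\top P)^{-1}P^\top \\ -P(I + P^\top P)^{-1} & -P(I + P^\top P)^{-1}P^\top \end{pmatrix} = \begin{pmatrix} (I + P^\top P)^{-1}P^\top P & -(I + P^\top P)^{-1}P^\top \\ -P(I + P^\top P)^{-1} & -P(I + P^\top P)^{-1}P^\top \end{pmatrix}.$$

Rewrite $\mathbf{f} = (\mathbf{f}_a, \mathbf{f}_b)$, where $\mathbf{f}_a \in \mathbb{R}^r$ includes the first $r$ entries of $\mathbf{f}$ and $\mathbf{f}_b$ includes the rest. Using $\|(I + P^\top P)^{-1}P^\top P\|_2 = \|P(I + P^\top P)^{-1}P^\top\|_2$, we have

$$\begin{aligned}
|\mathbf{f}^\top C\mathbf{f}| &\le (\|\mathbf{f}_a\|_2^2 + \|\mathbf{f}_b\|_2^2)\,\|P(I + P^\top P)^{-1}P^\top\|_2 + 2\|\mathbf{f}_a\|_2\|\mathbf{f}_b\|_2\,\|(I + P^\top P)^{-1}P^\top\|_2 \\
&\le (\|\mathbf{f}_a\|_2 + \|\mathbf{f}_b\|_2)^2\max\left(\|P(I + P^\top P)^{-1}P^\top\|_2,\ \|(I + P^\top P)^{-1}P^\top\|_2\right) \\
&\le 2\max\left(\|P(I + P^\top P)^{-1}P^\top\|_2,\ \|(I + P^\top P)^{-1}P^\top\|_2\right).
\end{aligned}$$

Since $\|P\|_2 \le \|P\|_F < 1$ because $\Delta > 3\|L_N - L_m\|_{HS}$, and

$$\|P(I + P^\top P)^{-1}P^\top\|_2 \le \|P\|_2^2, \qquad \|(I + P^\top P)^{-1}P^\top\|_2 \le \|P\|_2,$$

we have

$$\max_{\|f\|_{\mathcal{H}_\kappa} \le 1} |\langle f, \Delta H f\rangle| = \max_{\|\mathbf{f}\|_2 \le 1} |\mathbf{f}^\top C\mathbf{f}| \le 2\|P\|_2 \le 2\|P\|_F \le \frac{4\|L_N - L_m\|_{HS}}{\Delta - \|L_N - L_m\|_{HS}} \le \frac{6\|L_N - L_m\|_{HS}}{\Delta}.$$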
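The block algebra above can be verified numerically. The NumPy sketch assumes a random small $P$ with $\|P\|_2 < 1$ (as Theorem 4 guarantees under the eigengap condition) and checks the expression for $C$, the identity $I - (I + P^\top P)^{-1} = (I + P^\top P)^{-1}P^\top P$, and the norm bounds used in the last step; `spectral_norm` is a hypothetical helper.

```python
import numpy as np

def spectral_norm(M):
    """Largest singular value of M."""
    return np.linalg.norm(M, 2)

rng = np.random.default_rng(0)
n, r = 7, 3
P = 0.1 * rng.normal(size=(n - r, r))
S = np.linalg.inv(np.eye(r) + P.T @ P)            # (I + P^T P)^{-1}

IP = np.vstack([np.eye(r), P])                     # the stacked matrix [I; P]
AAt = IP @ S @ IP.T                                # A A^T with A = [I; P](I + P^T P)^{-1/2}
J = np.zeros((n, n))
J[:r, :r] = np.eye(r)                              # matrix of H_r in the basis phi_1..phi_n
C = np.block([[np.eye(r) - S, -S @ P.T],
              [-P @ S, -P @ S @ P.T]])

print(np.allclose(J - AAt, C))
print(spectral_norm(C), 2 * spectral_norm(P), 2 * np.linalg.norm(P, "fro"))
```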